Mining Compressing Patterns in a Data Stream

Authors

  • Hoang Thanh Lam
  • Toon Calders
  • Jie Yang
  • Fabian Mörchen
  • Dmitriy Fradkin
Abstract

Mining patterns that compress the data well has been shown to be an effective approach for extracting meaningful patterns and solving the redundancy issue in frequent pattern mining. Most existing work in the literature considers mining compressing patterns from a static database of itemsets or sequences. These approaches require multiple passes through the data and do not scale with the size of data streams. In this paper, we study the problem of mining compressing sequential patterns from a data stream. We propose an approximate algorithm that needs only a single pass through the data and efficiently extracts a meaningful and non-redundant set of sequential patterns. Experiments on three synthetic and three real-world large-scale datasets show that our approach extracts meaningful compressing patterns comparable to those found by the state-of-the-art multi-pass algorithms proposed for static databases of sequences. Moreover, our approach scales linearly with the size of the data stream, whereas none of the state-of-the-art algorithms does.
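To make the idea of a "compressing" pattern set concrete, here is a toy sketch in Python. It is not the algorithm proposed in the paper. It rests on simplifying assumptions that are not from the abstract: patterns match only contiguous runs of events, every code word and every dictionary symbol costs one unit, and the names `encoded_length` and `total_cost` are hypothetical. It reads the stream once, keeps a buffer no longer than the longest pattern, and greedily replaces each match with a single code word.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def encoded_length(stream: Iterable[str], patterns: Sequence[tuple[str, ...]]) -> int:
    """Count the code words a greedy single-pass encoder emits (toy cost model)."""
    # Assumption: longer patterns are tried first so each match saves more.
    ordered = sorted(patterns, key=len, reverse=True)
    max_len = max((len(p) for p in ordered), default=1)
    buffer: list[str] = []
    words = 0

    def flush(final: bool) -> None:
        nonlocal words
        # Emit code words only once the buffer is long enough to decide a match,
        # or unconditionally when the stream has ended.
        while buffer and (final or len(buffer) >= max_len):
            for p in ordered:
                if tuple(buffer[:len(p)]) == p:
                    del buffer[:len(p)]  # whole pattern becomes one code word
                    break
            else:
                del buffer[0]  # no pattern matched: emit the single event
            words += 1

    for event in stream:  # single pass, memory bounded by max_len
        buffer.append(event)
        flush(final=False)
    flush(final=True)
    return words


def total_cost(stream: Iterable[str], patterns: Sequence[tuple[str, ...]]) -> int:
    """Dictionary cost (one unit per symbol) plus the encoded stream length."""
    return sum(len(p) for p in patterns) + encoded_length(stream, patterns)


events = list("abcabcabcxabc")
print(total_cost(events, []))                 # cost with no patterns
print(total_cost(events, [("a", "b", "c")]))  # cost with one pattern
```

Under this cost model, a pattern set is worth keeping when `total_cost` with the patterns is lower than `total_cost` without them. A real description-length measure would weight code words by their usage frequency rather than charging one unit each.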

Similar resources

CASW: Context Aware Sliding window for Frequent Itemset Mining over Data Streams

In recent years, advances in both hardware and software technologies, coupled with high-speed data generation, have led to data streams and data stream mining. Data is generated much faster in data stream applications, and large amounts of data are produced in a short turnaround time. Hence it becomes natural to mine the data on arrival, which is usually termed data stream mining. General fre...

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that appear frequently in the data set and mark them as frequent patterns so that users can make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...

Mining Compressing Sequential Patterns

Compression based pattern mining has been successfully applied to many data mining tasks. We propose an approach based on the minimum description length principle to extract sequential patterns that compress a database of sequences well. We show that mining compressing patterns is NP-Hard and belongs to the class of inapproximable problems. We propose two heuristic algorithms to mining compress...

Mining Compressed Repetitive Gapped Sequential Patterns Efficiently

Mining frequent sequential patterns from sequence databases has been a central research topic in data mining, and various efficient sequential pattern mining algorithms have been proposed and studied. Recently, in many problem domains (e.g., program execution traces), a novel sequential pattern mining problem, called mining repetitive gapped sequential patterns, has attracted the attention of m...

Incrementally Mining Recently Repeating Patterns over Data Streams

Repeating patterns represent temporal relations among data items, which can be used for data summarization and data prediction. More and more application data is generated as a data stream. When recency matters, mining repeating patterns from the whole history of a data stream does not capture the current trend of patterns in the stream. Therefore, the tra...

Publication date: 2013